-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Feature/grading vscode test response #943
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add documentation for VSCode Integration Tests Use explicit list of tests instead of searching ot make it easier to run 1 test at time locally.
|
src/test/suite/index.ts
Outdated
| try { | ||
| // Find all test files | ||
| const files = await glob("**/**.test.js", { cwd: testsRoot }) | ||
| //const files = await glob("**/**.test.js", { cwd: testsRoot } leaving this commented out for now since we only have three tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it hurt to keep this? I can imagine someone getting confused if they want to add more tests and they don't know about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't necessarily hurt to keep this, but it makes getting set up to run a single test locally more annoying. I made sure that in the VSCODE_INTEGRATION_TESTS.md there are explicit instructions around that part. Also this allows us to potentially quickly turn a test on or off by removing it from the list without having to move or delete the test.
This set up with a list of tests is similar to how we do things in Roo Automation Mobile, but happy to swap back to a single line if that's what makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think it's more intuitive to just have it run all tests without needing to add them to this list. You can always add .skip or .only to control which ones are run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the change to have all tests run by default, but I am leaving the array of test files commented out with instructions as it is way easier to modify the index.ts file to only run the single test someone is working on than having to go to each test they don't want to run and add skip. I don't believe the .only function works for this setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.only seems to work for me. What do you see when you try?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was worried that the runner was going to go in order so, if it grabbed the test that didn't have .only first it would still run it, but it seems like it is smart enough. I'll remove the list from index.ts and update the doc
| ({ type, text }) => type === "say" && text?.includes("software engineer"), | ||
| ), | ||
| "Did not receive expected response containing 'I am Roo in Code mode, specializing in software engineering'", | ||
| await globalThis.provider.updateGlobalState("mode", "Ask") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be better to use handleModeSwitch so it does the associated api config switch etc (in case we wanted to use another model to evaluate this someday)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can explore using that function. It wouldn't be difficult to also swap the model in a single line as well since we set that at the beginning of the test run in index.ts using the same mechanism.
Ideally this "Grading" portion of the test becomes a helper function that any test can call with a prompt and response/output and that handles all the necessary Roo Code settings configurations for the grading.
Modify where we log messages to the console for the test
| await new Promise((resolve) => setTimeout(resolve, interval)) | ||
| } | ||
|
|
||
| await globalThis.provider.updateGlobalState("mode", "Code") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding cleanup logic for the global state changes to prevent side effects on other tests.
| } | ||
| }) | ||
| const grade = globalThis.provider.messages.find( | ||
| ({ type, text }) => type === "say" && !text?.includes("Grade: (1-10)") && text?.includes("Grade:"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick, but maybe could use a regex to pull out the score?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added regex to look for the grade, it is still a little fuzzy given the variability of the response from the LLMs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something like this would be a little more DRY?
const gradeMessage = globalThis.provider.messages.find(
({ type, text }) => type === "say" && !text?.includes("Grade: (1-10)") && text?.includes("Grade:"),
)?.text
const gradeMatch = gradeMessage?.match(/Grade: (\d+)/)
const gradeNum = gradeMatch ? parseInt(gradeMatch[1]) : undefined
assert.ok(
gradeNum !== undefined && gradeNum >= 7 && gradeNum <= 10,
"Grade must be between 7 and 10",
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
Add regex look for grade in in modes.test.ts
Update documentation Update modes.test.ts to use PR comment suggestion for pulling out reponse grade
src/test/VSCODE_INTEGRATION_TESTS.md
Outdated
| npm run test:integration | ||
| ``` | ||
|
|
||
| 3. If you want to run a specific test, you can use the `test.only` function in the test file. This will run only the test you specify and ignore the others. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a note that test.only should be removed before committing to ensure full test coverage on CI.
| 3. If you want to run a specific test, you can use the `test.only` function in the test file. This will run only the test you specify and ignore the others. | |
| 3. If you want to run a specific test, you can use the `test.only` function in the test file. This will run only the test you specify and ignore the others. Remember to remove `test.only` before committing to ensure full test coverage on CI. |
Description
Changing how we determine if
modes.test.tspasses. We now use the Ask mode of Roo Code to determine if the output from a previous task is correct.Modified the
index.tsfile to have the option to use an explicit list of test files to make it easier to run 1 test at time locally.Also added documentation for VS Code Integration Tests
Future Enhancement: Use a different model to grade the response instead of the same model.
Test Procedure
Tested the new grading assertion locally.
Type of Change
Pre-flight Checklist
npm test) and code is formatted and linted (npm run format && npm run lint)npm run changeset(required for user-facing changes)Screenshots
n/a
Additional Notes
n/a
Important
Add grading mechanism for VSCode test responses using Roo Code's Ask mode and document integration test setup.
modes.test.ts.index.tsto allow running specific test files locally.VSCODE_INTEGRATION_TESTS.mdfor VSCode integration test setup.modes.test.tsto include grading logic for mode switching responses.task.test.tsto ensure correct handling of prompt and response.This description was created by
for 4886cb7. It will automatically update as commits are pushed.